Large-Scale Support Vector Machines: Algorithms and Theory
Abstract
Support vector machines (SVMs) are a very popular method for binary classification. Traditional training algorithms for SVMs, such as chunking and SMO, scale superlinearly with the number of examples, which quickly becomes infeasible for large training sets. Dataset sizes have been growing steadily over the past few years, which necessitates the development of training algorithms that scale at worst linearly with the number of examples. We survey work on SVM training methods that target this large-scale learning regime. Most of these algorithms use either (1) variants of primal stochastic gradient descent (SGD), or (2) quadratic programming in the dual. For (1), we discuss why SGD generalizes well even though it is poor at optimization, and describe algorithms such as Pegasos and FOLOS that extend basic SGD to quickly solve the SVM problem. For (2), we survey recent methods such as dual coordinate descent and BMRM, which have proven competitive with the SGD-based solvers. We also discuss the recent work of [Shalev-Shwartz and Srebro, 2008], which concludes that training time for SVMs should actually decrease as the training set size increases, and explain why SGD-based algorithms are able to satisfy this desideratum.

1. WHY LARGE-SCALE LEARNING?

Supervised learning involves analyzing a given set of labelled observations (the training set) so as to predict the labels of unlabelled future data (the test set). Specifically, the goal is to learn some function that describes the relationship between observations and their labels. Archetypal examples of supervised learning include recognizing handwritten digits and spam classification.

One parameter of interest for a supervised learning problem is the size of the training set. We call a learning problem large-scale if its training set cannot be stored in a modern computer's memory [Langford, 2008]. A deeper definition of large-scale learning is that it consists of problems where the main computational constraint is the amount of time available, rather than the number of examples [Bottou and Bousquet, 2007]. A large training set poses a challenge for the computational complexity of a learning algorithm: for an algorithm to be feasible on such datasets, it must scale at worst linearly with the number of examples.

Most learning problems that have been studied thus far are medium-scale, in that they assume that the training set can be stored in memory and repeatedly scanned. However, with the growing volume of data in the last few years, we have started to see problems that are large-scale. An example of this is ad-click data for search engines. When most modern search engines produce results for a query, they also display a number of (hopefully) relevant ads. When the user clicks on an ad, the search engine receives some commission from the ad sponsor. This means that to price the ad reasonably, the search company needs to have a good estimate of whether, for a given query, an ad is likely to be clicked or not. One way to formulate this as a learning problem is to have training examples consisting of an ad and its corresponding search query, and a label denoting whether or not the ad was clicked. We wish to learn a classifier that tells us whether a given ad is likely to be clicked if it were generated for a given query.
Given the volume of queries that search engines process (Google handles around 7.5 billion queries a month [Searchenginewatch.com, 2008]), the potential size of such a training set can far exceed the memory capacity of a modern system. Conventional learning algorithms cannot handle such problems, because we can no longer store and have ready access to the data in memory. This necessitates the development of new algorithms, and a careful study of the challenges posed by problems of this scale. An extra motivation for studying such algorithms is that they can also be applied to medium-scale problems, which remain of immediate practical interest.

Our focus in this document is how a support vector machine (SVM), a popular method for binary classification that is based on strong theory and enjoys good practical performance, can be scaled to work with large training sets. There have been two strands of work in the literature on this topic. The first is a theoretical analysis of the problem, in an attempt to understand how learning algorithms need to change to adapt to the large-scale setting. The other is the design of training algorithms for SVMs that work well on these large datasets, including the recent Pegasos solver [Shalev-Shwartz et al., 2007], which leverages the theoretical results on large-scale learning to actually decrease its runtime when given more examples. We discuss both strands, and attempt to identify the limitations of current solvers. First, let us define more precisely the large-scale setting that we are considering, and describe some general approaches to solving such problems.

1.1 Batch and online algorithms

When we discuss supervised learning problems with a large training set, we implicitly assume that learning is done in the batch framework. We do not focus on the online learning scenario, which consists of a potentially infinite stream of training examples presented one at a time, although such a setting can certainly be thought of as large-scale learning. However, it is possible for an online algorithm to solve a batch problem, and in fact this may be desirable in the large-scale setting, as we discuss below.

More generally, an intermediate between batch and online algorithms is what we call an online-style algorithm. This is an algorithm that assumes a batch setting, but uses only a sublinear amount of memory, and whose computational complexity scales only sublinearly with the number of examples. This precludes batch algorithms that repeatedly process the entire training set at each iteration. A standard online algorithm can be converted into an online-style algorithm
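To make the online-style notion concrete, here is a minimal sketch, assuming a NumPy environment, of a single Pegasos-flavoured SGD pass over a stream of (x, y) examples for the primal SVM objective. The function names (`pegasos_style_sgd`, `toy_stream`), the parameter values, and the synthetic data are illustrative assumptions rather than the survey's own code, and Pegasos's optional projection step is omitted.

```python
import numpy as np

def pegasos_style_sgd(example_stream, dim, lam=0.01, max_steps=100_000):
    """Single pass of Pegasos-flavoured SGD on the primal SVM objective
        min_w  (lam / 2) * ||w||^2 + (1 / n) * sum_i max(0, 1 - y_i <w, x_i>).
    Only the weight vector is kept in memory, so the memory footprint does
    not grow with the number of examples (the 'online-style' property)."""
    w = np.zeros(dim)
    for t, (x, y) in enumerate(example_stream, start=1):
        if t > max_steps:
            break
        margin = y * np.dot(w, x)    # computed before the update, as in Pegasos
        eta = 1.0 / (lam * t)        # Pegasos step-size schedule
        w *= (1.0 - eta * lam)       # shrinkage from the regularizer
        if margin < 1.0:             # hinge loss active: add its subgradient term
            w += eta * y * x
    return w

# Toy usage on a synthetic stream that is never materialized as a full dataset.
def toy_stream(n, dim, seed=0):
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    for _ in range(n):
        x = rng.normal(size=dim)
        yield x, (1.0 if x @ w_true > 0 else -1.0)

w_hat = pegasos_style_sgd(toy_stream(50_000, dim=20), dim=20)
```

The point of the sketch is that memory is dominated by the weight vector alone and each example is touched once, which is exactly the sublinear-memory behaviour described above.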